{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Numpy非常重要有用的数组合并操作"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "背景：在给机器学习准备数据的过程中，经常需要进行不同来源的数据合并的操作。\n",
    "\n",
    "两类场景：\n",
    "1. 给已有的数据添加多行，比如增添一些样本数据进去；\n",
    "2. 给已有的数据添加多列，比如增添一些特征进去；\n",
    "\n",
    "以下操作均可以实现数组合并：\n",
    "* np.concatenate(array_list, axis=0/1）：沿着指定axis进行数组的合并\n",
    "* np.vstack或者np.row_stack(array_list)：垂直vertically、按行row wise进行数据合并\n",
    "* np.hstack或者np.column_stack(array_list)：水平horizontally、按列column wise进行数据合并"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. 怎样给数据添加新的多行"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(6).reshape(2,3)\n",
    "b = np.random.randint(10,20,size=(4,3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[0, 1, 2],\n",
       "       [3, 4, 5]])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[13, 16, 13],\n",
       "       [17, 15, 14],\n",
       "       [19, 13, 19],\n",
       "       [10, 13, 10]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2],\n",
       "       [ 3,  4,  5],\n",
       "       [13, 16, 13],\n",
       "       [17, 15, 14],\n",
       "       [19, 13, 19],\n",
       "       [10, 13, 10]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法1：\n",
    "np.concatenate([a,b])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2],\n",
       "       [ 3,  4,  5],\n",
       "       [13, 16, 13],\n",
       "       [17, 15, 14],\n",
       "       [19, 13, 19],\n",
       "       [10, 13, 10]])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法2\n",
    "np.vstack([a,b])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2],\n",
       "       [ 3,  4,  5],\n",
       "       [13, 16, 13],\n",
       "       [17, 15, 14],\n",
       "       [19, 13, 19],\n",
       "       [10, 13, 10]])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法3\n",
    "np.row_stack([a, b])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. 怎样给数据添加新的多列"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = np.arange(12).reshape(3,4)\n",
    "b = np.random.randint(10,20,size=(3,2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3],\n",
       "       [ 4,  5,  6,  7],\n",
       "       [ 8,  9, 10, 11]])"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[12, 16],\n",
       "       [18, 12],\n",
       "       [12, 12]])"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([[ 0,  1,  2,  3, 12, 16],\n",
       "       [ 4,  5,  6,  7, 18, 12],\n",
       "       [ 8,  9, 10, 11, 12, 12]])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 方法1\n",
    "np.concatenate([a,b], axis=1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 方法2\n",
    "np.hstack([a,b])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 方法3\n",
    "np.column_stack([a,b])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
